Abstract

We study the setting in which the bits of an unknown infinite binary sequence x are revealed
sequentially to an observer. We show that very limited assumptions about x allow one to make successful
predictions about unseen bits of x. First, we study the problem of successfully predicting a single 0 from
among the bits of x. In our model we have only one chance to make a prediction, but may do so at a
time of our choosing. We describe and motivate this as the problem of a frog who wants to cross a road
safely.

Letting $N_t$ denote the number of 1s among the first t bits of x, we say that x is "$\epsilon$-weakly sparse" if
$\liminf (N_t/t) \le \epsilon$. Our main result is a randomized algorithm that, given any $\epsilon$-weakly sparse sequence
x, predicts a 0 of x with success probability as close as desired to $1 - \epsilon$. Thus we can perform this task
with essentially the same success probability as under the much stronger assumption that each bit of x
takes the value 1 independently with probability $\epsilon$. We apply this result to show how to successfully
predict a bit (0 or 1) under a broad class of possible assumptions on the sequence x. The assumptions
are stated in terms of the behavior of a finite automaton M reading the bits of x.

We also propose and solve a variant of the well-studied "ignorant forecasting" problem. For every
$\epsilon > 0$, we give a randomized forecasting algorithm that, given sequential access to a binary sequence
x, makes a prediction of the form: "A p fraction of the next N bits will be 1s." (The algorithm gets to
choose p, N, and the time of the prediction.) For any fixed sequence x, the forecast fraction p is accurate
to within $\pm\epsilon$ with probability $1 - \epsilon$.

1 Introduction 

1.1 The frog crossing problem 

A frog wants to cross the road at some fixed location, to get to a nice pond. But she is concerned about cars. 
It takes her a minute to cross the road, and if a car passes during that time, she will be squashed. However, 
this is no ordinary frog. She is extremely patient, and happy to wait any finite number of steps to cross 
the road. What's more, she can observe and remember how many cars have passed, as well as when they 
passed. She can follow any algorithm to determine when to cross the road based on what she has seen so 
far, although her senses aren't keen enough to detect a car before it arrives. 

Think of a "car-stream" as defined by an infinite sequence of 0s and 1s describing the minutes when a car
passes (we model time as discrete, and assume that at most one car passes each minute). We ask, under what
assumptions on the car-stream can our frog cross the road safely? Obviously, if there is constant,
bumper-to-bumper traffic (the car-stream described by the sequence (1, 1, 1, . . .)), then she cannot succeed, so we
must make some assumption.

One natural approach to this kind of situation is to assume traffic is generated according to some
probabilistic model. For example, we might assume that during each minute, a car arrives with probability .1,
and that these events are independent. More complicated assumptions (involving dependence between the
car arrival-times, for example) can also be considered.

However, the frog may not have a detailed idea of how the cars are generated. It may be that the frog 
merely knows or conjectures some constraint obeyed by the car-stream. We then ask whether there exists 
a strategy which gets the frog safely across the road (at least, with sufficiently high probability), for any 
car-stream obeying the constraint. This model will be our focus in the present paper. For example, suppose 
the cars appear to arrive more-or-less independently with probability .1 at each minute. The frog may be
unsure that the independence assumption is fully justified, so she may make the weaker assumption that the 
limiting car-density is at most .1. 

Note that this condition holds with probability 1 if the cars really are generated by independent .1-biased 
trials, so this constraint can be considered a natural relaxation of the original probabilistic model. The frog 
may then ask whether there exists a strategy that gets her across the road safely with probability nearly .9 
under this relaxed assumption. (Happily, the answer is Yes; this will follow from our main result.) 

1.2 Relation to previous work 

Our work studies prediction under adversarial uncertainty. In such problems, an observer tries to make 
predictions about successive states of nature, without assuming that these states are governed by some 
known probability distribution. Instead, nature is regarded as an adversary who makes choices in an attempt 
to thwart the observer's prediction strategy. The focus is on understanding what kinds of predictions can be 
made under very limited assumptions about the behavior of nature. 

Adversarial prediction is a broad topic, but two strands of research are particularly related to our work. 
The first strand is the study of gales and their relatives. Gales are a class of betting systems generalizing 
martingales; their study is fundamental for the theory of effective dimension in theoretical computer science 
(see [Hem05] for a survey). The basic idea is as follows. An infinite sequence x is chosen from some known
subset A of the space $\{0,1\}^\omega$ of infinite binary sequences. A gambler is invited to gamble on predicting the
bits of x as they are sequentially revealed; the gambler has a finite initial fortune and cannot go into debt.
The basic question is, for which subsets A can the gambler be guaranteed long-term success in gambling, for
any choice of $x \in A$? This question can be studied under different meanings of "success" for the gambler,
and under more- or less-favorable classes of bets offered by the casino.

Intuitively, the difficulty of gambling successfully on an unknown $x \in A$ is a measure of the "largeness"
of the set A. In fact, this perspective was shown to yield new characterizations of two important measures
of fractal dimensionality. Lutz [Lut03a] gave a characterization of the Hausdorff dimension of subsets of
$\{0,1\}^\omega$ in terms of gales, while Athreya, Hitchcock, Lutz, and Mayordomo [AHLM07] showed a gale
characterization of the packing dimension. These works also investigated gales with a requirement that the
gambler follows a computationally bounded betting strategy; using such gales, the authors explored new
notions of "effective dimension" for complexity classes in computational complexity theory.¹

¹Computationally bounded betting and prediction schemes have also been used to study individual sequences x, rather than
sets of sequences. This approach has been followed using various resource bounds and measures of predictive success; see,
e.g., [MF98, Lut03b].

The second strand of related work is the so-called forecasting problem in decision theory (see [Daw82]
for an early, influential discussion). In this problem, an infinite binary sequence $x \in \{0,1\}^\omega$ is once again
revealed sequentially; we typically think of the t-th bit as indicating whether it rained on the t-th day at some
location of interest. Each day a weather forecaster is asked to give, not an absolute prediction of whether it
will rain tomorrow, but instead some estimate of the probability of rain tomorrow. In order to keep his job
as the local weather reporter, the forecaster is expected to make forecasts which have the property of being
calibrated: roughly speaking, this means that if we consider all the days for which the forecaster predicted
some probability p of rain, about a p fraction turn out rainy (see [FV98] for more precise definitions).

In the adversarial setting, a forecaster must make such forecasts without knowledge of the probability 
distribution governing nature. In the well-studied "ignorant forecaster" model, the forecaster is allowed no 
assumptions whatsoever about the sequence x. Nevertheless, it is a remarkable fact, shown by Foster and
Vohra [FV98], that there exists a randomized ignorant forecasting scheme whose forecasts are calibrated in
the limit.

This result was extended by Sandroni [San03]. The calibration criterion is just one of many conceivable
"tests" with which we might judge a forecaster's knowledge on the basis of his forecasts and the observed
outcomes. Foster and Vohra's result showed that the calibration test can be passed even by an ignorant
forecaster; but conceivably some other test of knowledge could be more meaningful. A reasonable class of
tests to consider are those that can be passed with some high probability $1 - \epsilon$ by a forecaster who knows the
actual distribution V governing nature, for any possible setting of V. However, Sandroni showed that any
such test can also be passed with probability $1 - \epsilon$ by an ignorant forecaster! Fortnow and Vohra [FV09]
give evidence that the ignorant strategies provided by Sandroni's result cannot in general be computed in
polynomial time, even if the test is polynomial-time computable.²

In both of the strands of research described above, researchers have typically looked for prediction
schemes that have some desirable long-term, aggregate property. In the gale setting, the focus is on betting
strategies that may lose money on certain bets, but that succeed in the limit; in the forecasting problem, 
an ignorant forecaster wants his forecasts to appear competent overall, but is not required to give definite 
predictions of whether or not it will rain on any given day. By contrast, in our frog problem, the frog wants 
to cross the road just once, and her life depends on the outcome. Our focus is on making a single prediction, 
with success probability as close to 1 as possible. 

In a later section of the paper we will also study a variant of the ignorant forecasting scenario.
Following [FV98, San03], we will make no assumption about the observation sequence x. Our goal will be to make
a single forecast at a time of our choosing, of the following form: "A p fraction of the next N observations 
will take the value 1." We will seek to maximize the accuracy of our prediction, as well as the likelihood of 
falling within the desired accuracy. This forecasting variant is conceptually linked to our frog problem by 
its focus on making a single prediction with high confidence. 



1.3 Our results on the frog crossing problem 

We now return to our patient frog. 

To appreciate the kinds of frog-strategies that are possible, we first consider a simple but instructive
example. Suppose the frog knows that at most one car will ever drive by. In this case, the frog might choose
to wait until she sees a car pass; however, this strategy makes her wait forever if no car ever arrives, and we
consider this a failure. Similarly, suppose the frog follows a deterministic strategy which, for some $t \ge 1$,
makes her cross on the t-th minute if she has not yet seen a car. Then the frog is squashed on the car-stream
consisting of a single car passing at the t-th minute. Thus, any deterministic strategy fails against some
car-stream obeying our constraint.

²The tests considered in [San03, FV09] are required to halt with an answer in finite time. See [FV09] for references to work in
which this restriction is relaxed.

What is left for the frog? We recommend following a randomized strategy. Fixing some $\delta > 0$, consider
the following strategy: the frog chooses a value $t^* \in \{1, 2, \ldots, \lceil 1/\delta \rceil\}$ uniformly at random, and crosses at
time $t^*$. Let's analyze this algorithm. Fix any car-stream consisting of at most one car, say arriving at time
$t \ge 1$ (where $t := \infty$ if no car arrives). Then the strategy above fails only if $t^* = t$, which occurs with
probability at most $\lceil 1/\delta \rceil^{-1} \le \delta$.
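
The uniform strategy just described can be sketched in a few lines of Python. This is only an illustration: the function names are ours, not part of the paper, and the car-stream is summarized by the arrival minute of its single car (or `None` if no car ever arrives).

```python
import math
import random
from fractions import Fraction

def num_slots(delta):
    """Number of candidate crossing times, ceil(1/delta)."""
    return math.ceil(1 / Fraction(delta))

def sample_crossing_time(delta, rng=random):
    """Pick t* uniformly at random from {1, ..., ceil(1/delta)}."""
    return rng.randint(1, num_slots(delta))

def failure_probability(delta, car_time):
    """Exact probability that t* coincides with the single car's arrival time.

    car_time=None models a car-stream in which no car ever arrives.
    """
    n = num_slots(delta)
    if car_time is None or car_time > n:
        return Fraction(0)
    return Fraction(1, n)
```

For instance, with $\delta = 3/10$ the frog chooses among 4 minutes, so a car at minute 2 squashes her with probability 1/4, which is below $\delta$.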

Note that this error probability is over the randomness in the algorithm, not the car-stream; we regard 
the car-stream as chosen by an adversary who knows the frog's strategy, but not the outcomes of the frog's 
random decisions. We are interested in strategies which succeed with high probability against any choice by
the adversary (obeying the assumed constraint). 

An easy modification of the above algorithm lets the frog succeed with probability $1 - \delta$ against a
car-stream promised to contain at most M cars, for any fixed $M < \infty$. However, it may come as a surprise that
we can succeed given a much weaker assumption. The reader is invited to try the following puzzle:

Puzzle 1. For any $\delta > 0$, give a frog-strategy that succeeds with probability $1 - \delta$, under the assumption
that the number of cars is finite.

The assumption can be weakened further. Fixing a car-stream, let $N_t$ denote the number of cars
appearing in the first t minutes. Say that the car-stream is sparse if $N_t = o(t)$, that is, if

$$\lim_{t \to \infty} N_t/t = 0.$$



Puzzle 2. Give a frog-strategy that succeeds with probability $1 - \delta$, under the assumption that the car-stream
is sparse.

Note that in Puzzle 2, the frog is promised that the fraction $N_t/t$ approaches 0 as $t \to \infty$, but she has no
idea how quickly it will do so.

Say the car-stream is weakly sparse if $N_t \ne \Omega(t)$, that is, if

$$\liminf_{t \to \infty} N_t/t = 0.$$

Puzzle 3. Give a frog-strategy that succeeds with probability $1 - \delta$, under the assumption that the car-stream
is weakly sparse.

In this paper we provide a solution to Puzzle 3. This immediately implies a solution for Puzzles 1 and 2,
but these first two puzzles also have simpler solutions, which we encourage the reader to find. The basic
idea of our solution to Puzzle 3 is easy to state: roughly speaking, seeing fewer cars increases the frog's
"courage" and makes her more likely to decide to cross. Correctly implementing and analyzing this idea
turns out to be a delicate task, however.

We actually prove a quantitative strengthening of Puzzle 3. For any $\epsilon > 0$, say that a car-stream is
$\epsilon$-weakly sparse if

$$\liminf_{t \to \infty} N_t/t \le \epsilon.$$

Our main result is that, under the assumption that the car-stream is $\epsilon$-weakly sparse, the frog can cross
successfully with probability as close as desired to $1 - \epsilon$. We state our result formally in Section 2, after
setting up the necessary definitions.

Our result bears some resemblance to known results in dimension theory. Let $A_{\epsilon\text{-ws}} \subseteq \{0,1\}^\omega$ denote
the set of $\epsilon$-weakly sparse infinite binary sequences. Eggleston [Egg49, Bil65] showed that for $\epsilon < 1/2$, the
Hausdorff dimension of $A_{\epsilon\text{-ws}}$ is equal to the binary entropy $H(\epsilon)$. More recently, Lutz [Lut03a] gave an
alternative proof using his gale characterization of Hausdorff dimension (Lutz also calculated the "effective
dimension" of $A_{\epsilon\text{-ws}}$ according to several definitions). Lutz upper-bounds the Hausdorff dimension of
$A_{\epsilon\text{-ws}}$ by giving a gale betting strategy that "succeeds" (in the appropriate sense) against all $x \in A_{\epsilon\text{-ws}}$.
This betting strategy, which is simple and elegant, does not appear to be applicable to our puzzles. Indeed,
a major difference between our work and the study of gales is that gale betting strategies are deterministic
(at least under standard definitions [Lut03a, AHLM07]), whereas randomization plays a crucial role in our
frog-strategies.

1.4 Further results 

In Section 4 we prove an extension of the result of Puzzle 3, in a modified setting in which we are allowed
to predict either a 0 or a 1. We give a condition on the binary sequence x that is significantly more general
than weak sparsity, and that still allows a bit to be predicted with high confidence. The condition is stated in 
terms of a finite automaton M that reads x: we assume that x causes M to enter a designated set of "bad" 
states B only infrequently. A certain "strong accessibility" assumption on the states B is needed for our 
result. 

In a later section we study a problem closely related to the "ignorant forecasting" problem discussed earlier,
where (as in the frog problem) a single prediction is to be made. In the "density prediction game," an
arbitrary infinite binary sequence is chosen by Nature, and its bits are revealed to us sequentially. Our goal
is to make a single forecast of the form

"A p fraction of the next N bits will be 1s."

We are allowed to choose p, N, and the time at which we make our forecast. 

Fixing a binary sequence x, we say that a forecast described by (p, N), and made after viewing $x_t$, is
$\epsilon$-successful on x if the fraction of 1s among $x_{t+1}, \ldots, x_{t+N}$ is in the range $(p - \epsilon, p + \epsilon)$. For $\delta, \epsilon > 0$, we
say that a (randomized) forecasting strategy S is $(\delta, \epsilon)$-successful if for every $x \in \{0,1\}^\omega$,

$$\Pr[S \text{ is } \epsilon\text{-successful on } x] \ge 1 - \delta.$$
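
To make the success criterion concrete, here is a small Python sketch that checks whether one particular forecast is $\epsilon$-successful on a finite prefix of x. The function name and the list-based representation of x are our own illustrative choices; the check only evaluates a single forecast and says nothing about how a good strategy chooses p, N, and t.

```python
from fractions import Fraction

def is_eps_successful(x, t, p, N, eps):
    """Check whether the forecast (p, N), made after viewing x[0:t], is eps-successful.

    x is a finite prefix (list of 0/1 values) that must contain bits t+1, ..., t+N.
    """
    window = x[t:t + N]  # bits x_{t+1}, ..., x_{t+N} in the paper's 1-indexed notation
    if len(window) < N:
        raise ValueError("prefix too short to evaluate the forecast")
    frac = Fraction(sum(window), N)
    # The fraction of 1s must lie in the open interval (p - eps, p + eps).
    return p - eps < frac < p + eps
```

For the prefix 1, 0, 1, 1, 0, 0, 1, 0, a forecast made after two bits with N = 4 looks at the bits 1, 1, 0, 0, whose density is 1/2.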

In a later section we show the following, perhaps surprising, result:

Theorem 1. For any $\delta, \epsilon > 0$, there exists a $(\delta, \epsilon)$-successful forecasting strategy.

2 Preliminaries and the Main Theorem 

First we develop a formal basis to state and prove our main result. $\mathbb{N} = \{1, 2, \ldots\}$ denotes the positive whole
numbers. For $N \in \mathbb{N}$, $[N]$ denotes the set $\{1, 2, \ldots, N\}$. $\{0,1\}^\omega$ denotes the set of all infinite bit-sequences
$b = (b_1, b_2, \ldots)$. We will freely refer to any such sequence as a "car-stream," where $b_i = 1$ means "a car
appears during the i-th minute."

A frog-strategy (or simply strategy) is a collection

$$S = \{\pi_{S,b} : b \in \{0,1\}^\omega\},$$

where each $\pi_{S,b}$ is a probability distribution over $\mathbb{N} \cup \{\infty\}$. We require that for all $b = (b_1, b_2, \ldots)$,
$b' = (b'_1, b'_2, \ldots)$, and all $i \in \mathbb{N}$,

$$(b_1, \ldots, b_{i-1}) = (b'_1, \ldots, b'_{i-1}) \;\Rightarrow\; \pi_{S,b}(i) = \pi_{S,b'}(i). \tag{1}$$

That is, $\pi_{S,b}(i)$ depends only on $b_1, \ldots, b_{i-1}$.

Let us interpret the above definition. A frog-strategy defines, for each car-stream b and each $i \in \mathbb{N}$, a
probability $\pi_{S,b}(i)$ that, when facing the car-stream b, the frog will attempt to cross at step i. There is also
some probability $\pi_{S,b}(\infty)$ that the frog will wait forever without crossing. Whether it lives or dies, the frog
attempts to cross at most once, so these probabilities sum to 1. Eq. (1) requires that the frog's decision
for the i-th minute depends only upon what it has seen of the car-stream during the first $(i - 1)$ minutes.
The frog-strategies we analyze in this paper will be defined in such a way that Eq. (1) obviously holds.

Given a frog-strategy S, define the success probability

$$\mathrm{Suc}(S, b) := \sum_{i \in \mathbb{N} : b_i = 0} \pi_{S,b}(i)$$

as the probability that, facing b, the frog crosses successfully during some minute when there is no car.
Similarly, define the death probability

$$\mathrm{DP}(S, b) := \sum_{i \in \mathbb{N} : b_i = 1} \pi_{S,b}(i) = 1 - \mathrm{Suc}(S, b) - \pi_{S,b}(\infty)$$

as the probability that the strategy S leads to the frog being squashed by a car on car-stream b. For a subset
$A \subseteq \{0,1\}^\omega$, define

$$\mathrm{Suc}(S, A) := \inf_{b \in A} \mathrm{Suc}(S, b).$$
Let $N_t = N_t(b) := b_1 + \ldots + b_t$. A car-stream b is called $\epsilon$-weakly sparse if

$$\lim_{s \to \infty} \Big( \inf_{t \ge s} N_t/t \Big) \le \epsilon.$$
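
As a rough illustration of the quantities in this definition, the following Python sketch computes the running densities $N_t/t$ on a finite prefix, together with a finite-horizon stand-in for the inner infimum. The helper names are ours. A finite prefix can never certify or refute $\epsilon$-weak sparsity, since the definition is a statement about the limit; the code only shows how the quantities are formed.

```python
from fractions import Fraction

def prefix_densities(bits):
    """Return [N_1/1, N_2/2, ..., N_T/T] for a finite prefix of a car-stream."""
    densities, count = [], 0
    for t, b in enumerate(bits, start=1):
        count += b                      # count is N_t
        densities.append(Fraction(count, t))
    return densities

def tail_infimum(bits, s):
    """Minimum of N_t/t over s <= t <= T, a finite stand-in for inf over t >= s."""
    return min(prefix_densities(bits)[s - 1:])
```

On the prefix 1, 1, 0, 0, 0, 1 the densities are 1, 1, 2/3, 1/2, 2/5, 1/2, so the tail infimum from s = 1 is 2/5.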

We can now formally state our main result: 

Theorem 2. Fix $\epsilon \in (0, 1)$ and let $A_{\epsilon\text{-ws}} := \{b : b \text{ is } \epsilon\text{-weakly sparse}\}$. Then for all $\gamma > 0$, there exists a
strategy $S_{\epsilon,\gamma}$ such that

$$\mathrm{Suc}(S_{\epsilon,\gamma}, A_{\epsilon\text{-ws}}) > 1 - \epsilon - \gamma.$$

Furthermore, $S_{\epsilon,\gamma}$ has the following "safety" property: for any car-stream $b \in \{0,1\}^\omega$, the death probability
$\mathrm{DP}(S_{\epsilon,\gamma}, b)$ is at most $\epsilon + \gamma$.

Note that b is weakly sparse (as defined in Section 1.3) exactly if it is $\epsilon$-weakly sparse for all $\epsilon > 0$.
Thus if $A_{\text{ws}} := \{b : b \text{ is weakly sparse}\}$, then by Theorem 2 we can succeed on $A_{\text{ws}}$ with probability as
close to 1 as we desire. This solves Puzzle 3.

It is not hard to see that Theorem 2 is optimal for frog-strategies against $A_{\epsilon\text{-ws}}$. For consider a randomly
generated car-stream b where the events $[b_i = 1]$ occur independently, with $E[b_i] = \min\{1, \epsilon + 2^{-i}\}$. Then
$[\lim_{t \to \infty} N_t(b)/t = \epsilon]$ occurs with probability 1. On the other hand, any frog-strategy S has success
probability less than $1 - \epsilon$ against b. Thus, for any S we can find a particular car-stream b for which
$\lim_{t \to \infty} N_t(b)/t = \epsilon$ and which causes S to succeed with probability less than $1 - \epsilon$.
3 Proof of the Main Theorem



In this section we prove Theorem 2. First we observe that, if we can construct a strategy S such that
$\mathrm{Suc}(S, A_{\epsilon\text{-ws}}) > 1 - \epsilon - \gamma$, then the "safety" property claimed for S in the theorem statement will follow
immediately. For suppose to the contrary that some car-stream $b \in \{0,1\}^\omega$ satisfies $\mathrm{DP}(S, b) > \epsilon + \gamma$.
Then there exists $m \in \mathbb{N}$ such that $\sum_{i \le m : b_i = 1} \pi_{S,b}(i) > \epsilon + \gamma$. If we define $b' \in \{0,1\}^\omega$ by

$$b'_i := \begin{cases} b_i & \text{if } i \le m, \\ 0 & \text{if } i > m, \end{cases}$$

then $b' \in A_{\epsilon\text{-ws}}$ and $\mathrm{DP}(S, b') > \epsilon + \gamma$, contradicting our assumption on S.

To construct the strategy S, we use a family of frog-strategies for attempting to cross the road within a 
finite, bounded interval of time. The following lemma is our key tool, and is interesting in its own right. 

Lemma 1. For any $\delta \in (0, 1)$ and integer $K > 1$, there exists a strategy $T = T_{K,\delta}$ such that for all
$b \in \{0,1\}^\omega$:

(i) The crossing time of T is always in $[K] \cup \{\infty\}$. That is, for $K < i < \infty$, we have $\pi_{T,b}(i) = 0$;

(ii) If $(b_1 + \ldots + b_{K-1})/(K - 1) \le \delta' < \delta$, then $\pi_{T,b}(\infty) \le 1 - \Omega((\delta - \delta')^2/\delta)$;

(iii) The death probability satisfies

$$\mathrm{DP}(T, b) \le \frac{\delta}{1 - \delta} \, \mathrm{Suc}(T, b) + O\!\left( \frac{\delta}{(1 - \delta) K} \right).$$

We defer the proof of Lemma 1, and use it to prove Theorem 2.

Proof of Theorem 2. Fix settings of $\epsilon, \gamma > 0$; we may assume $\epsilon + \gamma < 1$, or there is nothing to prove. Let
$\epsilon_1 := \epsilon + \gamma/3$, $\epsilon_2 := \epsilon + 2\gamma/3$. We also use a large integer $K > 1$, to be specified later. Divide $\mathbb{N}$ into a
sequence of intervals $I_1 = \{1, 2, \ldots, K\}$, $I_2 = \{K + 1, \ldots, 5K\}$, and so on, where $I_r$ has length $r^2 K$.
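
The endpoints of these intervals follow from the closed form for a sum of squares. The short Python sketch below (the function name is ours) returns the first and last minute of $I_r$, assuming only what the text states: $I_1$ starts at minute 1, the intervals are consecutive, and $I_r$ has length $r^2 K$.

```python
def interval(r, K):
    """Return (first, last) minute of I_r, where |I_j| = j*j*K and I_1 starts at 1."""
    # The intervals before I_r cover K * sum_{j<r} j^2 = K * (r-1) * r * (2r-1) / 6 minutes.
    start = K * (r - 1) * r * (2 * r - 1) // 6 + 1
    return start, start + r * r * K - 1
```

With K = 3 this gives $I_1 = \{1, \ldots, 3\}$ and $I_2 = \{4, \ldots, 15\}$, matching $I_2 = \{K + 1, \ldots, 5K\}$.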

Let $S = S_{\epsilon,\gamma}$ be the frog-strategy which does the following: first, follow the strategy $T_{K,\epsilon_2}$ (as given by
Lemma 1) during the time interval $I_1$. If no crossing is attempted during these steps, then run the strategy
$T_{4K,\epsilon_2}$ on the interval $I_2$, after shifting the indices of $I_2$ appropriately (so that $T_{4K,\epsilon_2}$ considers its input
sequence to begin on $b_{K+1}$). Similarly, for each $r > 0$, if we reach the interval $I_r$ without an attempted
crossing, we execute the strategy $T_{r^2 K,\epsilon_2}$ on the interval $I_r$, after shifting indices appropriately.

We will show that if K is sufficiently large, we have $\mathrm{Suc}(S, A_{\epsilon\text{-ws}}) > 1 - \epsilon - \gamma$ as required. Fix any
$b = (b_1, b_2, \ldots) \in A_{\epsilon\text{-ws}}$. Let $\alpha_r := \big( \sum_{i \in I_r} b_i \big) / |I_r|$ be the fraction of 1-entries in b during interval $I_r$.

Claim 1. For infinitely many r, $\alpha_r < \epsilon_1$.

Proof. Suppose to the contrary that $\alpha_r \ge \epsilon_1$ when $r \ge R$. Consider an interval $\{1, 2, \ldots, M\}$ large enough
to properly contain $I_1, I_2, \ldots, I_R$. Let $t \ge R$ be such that $I_t \subseteq [M]$ but $I_{t+1} \not\subseteq [M]$. Let $\alpha^*$ be the
fraction of 1-entries in $[M] \cap I_{t+1}$; we set $\alpha^* := 0$ if $[M] \cap I_{t+1} = \emptyset$. With $N_M = (b_1 + \ldots + b_M)$, we
have the expression

$$\frac{N_M}{M} = \sum_{r \le t} \frac{|I_r|}{M} \cdot \alpha_r + \frac{|[M] \cap I_{t+1}|}{M} \cdot \alpha^*,$$

giving the car-density (fraction of 1s) of b in $[M]$ as a weighted average of the car-densities in $I_1, \ldots, I_t$ and
in $[M] \cap I_{t+1}$.
Note that

$$\frac{|[M] \cap I_{t+1}|}{M} \le \frac{|I_{t+1}|}{M} \le \frac{(t+1)^2 K}{\sum_{r \le t} r^2 K} \longrightarrow 0$$

as $M \to \infty$. Now $\alpha_r \ge \epsilon_1$ when $r \ge R$, so for sufficiently large M we have $N_M/M \ge (\epsilon_1 + \epsilon)/2 > \epsilon$.
But this contradicts the fact that $b \in A_{\epsilon\text{-ws}}$, proving the Claim. □

Fix $r > 0$. If $I_r = \{j, \ldots, k\}$ and $\alpha_r = (b_j + \ldots + b_k)/(k - j + 1) < \epsilon_1$, then we also have
$(b_j + \ldots + b_{k-1})/(k - j) < \epsilon + \gamma/2$ if r is large enough. For any such r, condition (ii) of Lemma 1
tells us that if our frog-strategy reaches the interval $I_r$, it will attempt to cross during $I_r$ with probability
$\Omega((\gamma/6)^2/\epsilon_2)$. There are infinitely many such r, by Claim 1. Thus, the frog-strategy eventually attempts to
cross with probability 1. It follows that $\mathrm{DP}(S, b) = 1 - \mathrm{Suc}(S, b)$.

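For $r > 0$, let $P_r = P_r(b)$ be defined as the probability that S reaches $I_r$ without attempting to cross
earlier. Let $b[I_r]$ denote the sequence b, shifted to begin at the first bit of $I_r$. Then we can reexpress the death
probability of S on b, and bound this quantity, as follows:

$$\mathrm{DP}(S, b) = \sum_{r \ge 1} P_r \cdot \mathrm{DP}(T_{r^2 K, \epsilon_2}, b[I_r])$$

$$\le \sum_{r \ge 1} P_r \left( \frac{\epsilon_2}{1 - \epsilon_2} \, \mathrm{Suc}(T_{r^2 K, \epsilon_2}, b[I_r]) + O\!\left( \frac{\epsilon_2}{(1 - \epsilon_2) r^2 K} \right) \right)$$

(by condition (iii) of Lemma 1)

$$= \frac{\epsilon_2}{1 - \epsilon_2} \sum_{r \ge 1} P_r \cdot \mathrm{Suc}(T_{r^2 K, \epsilon_2}, b[I_r]) + O\!\left( \frac{\epsilon_2}{(1 - \epsilon_2) K} \right)$$

(using the fact that $\sum_{r > 0} r^{-2} < \infty$)

$$= \frac{\epsilon_2}{1 - \epsilon_2} \cdot \mathrm{Suc}(S, b) + O\!\left( \frac{\epsilon_2}{(1 - \epsilon_2) K} \right).$$

Thus, $\mathrm{DP}(S, b) = 1 - \mathrm{Suc}(S, b) \le \frac{\epsilon_2}{1 - \epsilon_2} \mathrm{Suc}(S, b) + O\!\left( \frac{\epsilon_2}{(1 - \epsilon_2) K} \right)$, which implies

$$\mathrm{Suc}(S, b) \ge 1 - \epsilon_2 - O\!\left( \frac{\epsilon_2}{(1 - \epsilon_2) K} \right) = 1 - (\epsilon + 2\gamma/3) - O\!\left( \frac{\epsilon_2}{(1 - \epsilon_2) K} \right).$$

By setting $K \gg \epsilon_2 \gamma^{-1} (1 - \epsilon_2)^{-1}$ sufficiently large, we can conclude $\mathrm{Suc}(S, b) > 1 - \epsilon - \gamma$, where the
slack in the inequality is independent of the choice of $b \in A_{\epsilon\text{-ws}}$. This proves Theorem 2. □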

Proof of Lemma 1. By an easy approximation argument, it suffices to prove the result for the case when $\delta$ is
rational. So assume

$$\delta = p/d,$$

for some integers $0 < p < d$, and let

$$q := d - p.$$
The frog-strategy T is as follows. First, pick a value $t^* \in [K]$ uniformly at random. Do not attempt to
cross on steps $1, 2, \ldots, t^* - 1$. During this time, maintain an ordered stack of "chips," initially empty. For
$1 \le i < t^*$, after viewing $b_i$, if $b_i = 0$ then add p chips to the top of the stack; if $b_i = 1$ then remove q
chips from the top of the stack (or, if the stack contains fewer than q chips, remove all the chips). After this
modification to the stack, we say that the bit $b_i$ has been "processed".

For $0 \le i \le K - 1$, let $H_i$ denote the number of chips on the stack after processing $b_1, \ldots, b_i$ (so,
$H_0 = 0$). After processing $b_{t^*-1}$, sample from a 0/1-valued random variable X, with expectation

$$E[X] := \frac{H_{t^*-1}}{dK}.$$

(Note that this expectation is at most $\frac{p(K-1)}{dK} < 1$, so the definition makes sense.) Attempt to cross at step
$t^*$ if $X = 1$, otherwise make no crossing attempt at any step.
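
The chip-stack strategy can be evaluated exactly on any given prefix, which makes a useful sanity check. The Python sketch below is illustrative only, with function names of our own choosing. It rests on one labeled assumption: that, given $t^*$, the frog crosses with probability $H_{t^*-1}/(dK)$, the normalization consistent with the bound $p(K-1)/(dK) < 1$ noted above.

```python
from fractions import Fraction

def chip_heights(bits, p, q):
    """Return [H_0, H_1, ..., H_n]: the stack height after processing each bit."""
    heights = [0]
    for b in bits:
        h = heights[-1]
        # A 0 adds p chips; a 1 removes q chips (or all of them, if fewer remain).
        heights.append(h + p if b == 0 else max(h - q, 0))
    return heights

def strategy_T_probabilities(bits, K, p, d):
    """Exact (Suc, DP, never-cross) probabilities of T on bits[0:K], with delta = p/d.

    ASSUMPTION: given t*, the frog crosses with probability H_{t*-1} / (d*K).
    """
    q = d - p
    H = chip_heights(bits[:K - 1], p, q)      # H[0], ..., H[K-1]
    suc = dp = Fraction(0)
    for t in range(1, K + 1):                 # t* = t with probability 1/K
        cross = Fraction(1, K) * Fraction(H[t - 1], d * K)
        if bits[t - 1] == 0:
            suc += cross                      # crossed during a car-free minute
        else:
            dp += cross                       # crossed while a car was passing
    return suc, dp, 1 - suc - dp
```

For example, with $\delta = 1/2$ (p = 1, d = 2), K = 4, and the prefix 0, 0, 0, 1, the heights are 0, 1, 2, 3, so the frog succeeds with probability 3/32, dies with probability 3/32, and never crosses with probability 13/16.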

Note that the variable $H_t$ can be regarded as a measure of the frog's "courage" after processing $b_1, \ldots, b_t$,
as in our sketch-description in Section 1.3. We now verify that T has the desired properties. Condition (i)
in Lemma 1 is clearly satisfied. Before verifying conditions (ii) and (iii), we first sketch why they hold. For
(ii), the idea is that if much less than a $\delta$ fraction of $b_1, \ldots, b_{K-1}$ are 1s, then the stack of chips will be of
significant height after processing these bits. Since the stack doesn't grow too quickly, we conclude that
the average stack height during these steps is significant, which implies that the frog attempts to cross with
noticeable probability.

For (iii), the idea is that for any chip c, if c stays on the stack for a significant amount of time, then
the fraction of 1s appearing during the interval in which c was on the stack must be not much more than $\delta$.
Thus c's contribution to the death probability is not much more than $\delta/(1 - \delta)$ times c's contribution to the
success probability. On the other hand, chips c which don't stay on the stack very long make only a small
contribution to the death probability.

Now we formally verify condition (ii). Fix some sequence b. First note that the placement and removal
of chips, and the height sequence $H_0, \ldots, H_{K-1}$, can be defined in terms of b alone, without reference to
the algorithm's random choices. Throughout our analysis we consider the stack to continue to evolve as a
function of the bits $b_1, \ldots, b_{K-1}$, regardless of the algorithm's choices.

Suppose $b_1 + \ldots + b_{K-1} \le \delta'(K - 1)$, where $\delta' < \delta$; we ask, how large can $\pi_{T,b}(\infty)$ be? From the
definition of T, we compute

$$\pi_{T,b}(\infty) = 1 - \frac{1}{dK^2} \sum_{0 \le t < K} H_t. \tag{2}$$

Now, for a chip c, let $m_c \in \mathbb{N}$ denote the number of indices $i < K$ for which c was on the stack immediately
after processing $b_i$. (We consider each chip to be "unique;" that is, it is added to the stack at most once.) We
can reexpress the sum appearing in Eq. (2) as

$$\sum_{0 \le t < K} H_t = \sum_{c} m_c.$$

We will lower-bound this sum by considering the contribution made by chips that are never removed from
the stack, that is, chips which remain after processing $b_{K-1}$. We call such chips "persistent." First, we
argue that there are many persistent chips. By our assumption, at least $p \cdot (1 - \delta')(K - 1)$ chips are added
to the stack in total, while at most $q \cdot \delta'(K - 1)$ chips are ever removed. Thus the number of persistent chips
is at least

$$p(1 - \delta')(K - 1) - q\delta'(K - 1) = p(1 - \delta + (\delta - \delta'))(K - 1) - q(\delta + (\delta' - \delta))(K - 1)$$

$$= \big[ \underbrace{p(1 - \delta) - q\delta}_{=0} + \underbrace{(p + q)}_{=d} (\delta - \delta') \big] (K - 1)$$

$$= (\delta - \delta') d (K - 1),$$

where we used $p/q = \delta/(1 - \delta)$. Let $J := (\delta - \delta') d (K - 1)$.

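Pick any J persistent chips, and number them $c(1), \ldots, c(J)$ so that $j' < j \le J$ implies $c(j')$ appears
above $c(j)$ on the stack after processing $b_{K-1}$. This means $c(j')$ was added to the stack no earlier than $c(j)$,
so that $m_{c(j')} \le m_{c(j)}$. At most p chips are added for every processed bit of b, and if $c(j)$ was added while
processing the $(K - i)$-th bit, then $m_{c(j)} = i$. Thus, by our indexing we conclude $m_{c(j)} \ge \lceil j/p \rceil \ge j/p$.
Summing over j, we obtain

$$\sum_{\text{persistent } c} m_c \ge \sum_{j=1}^{J} \frac{j}{p} = \frac{J(J+1)}{2p} \ge \frac{(\delta - \delta')^2 d^2 (K - 1)^2}{2p} = \frac{(\delta - \delta')^2 d (K - 1)^2}{2\delta}.$$

Finally, returning to Eq. (2), we compute

$$\pi_{T,b}(\infty) = 1 - \frac{1}{dK^2} \sum_{c} m_c \le 1 - \frac{1}{dK^2} \cdot \frac{(\delta - \delta')^2 d (K - 1)^2}{2\delta} \le 1 - \frac{(\delta - \delta')^2}{8\delta},$$

since $K > 1$ (so that $(K - 1)/K \ge 1/2$). This establishes condition (ii).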

Now we verify condition (iii). Fix any car-stream b. From our definitions, we have the expressions

$$\mathrm{Suc}(T, b) = \frac{1}{dK^2} \sum_{t \in [K] : b_t = 0} H_{t-1}, \qquad \mathrm{DP}(T, b) = \frac{1}{dK^2} \sum_{t \in [K] : b_t = 1} H_{t-1}.$$

Thus,

$$\mathrm{DP}(T, b) - (p/q) \, \mathrm{Suc}(T, b) = \frac{1}{dK^2} \left( \sum_{t \in [K] : b_t = 1} H_{t-1} - (p/q) \sum_{t \in [K] : b_t = 0} H_{t-1} \right). \tag{3}$$

We regard the quantity $H_{t-1}$ as being composed of a contribution of 1 from each of the chips on the stack
after processing $b_{t-1}$. We rewrite the right-hand side of Eq. (3) as a sum of the total contributions from each
chip. For a chip c, and for $z \in \{0, 1\}$, let

$$n_{c,z} := |\{ t \in [K] : b_t = z, \text{ and } c \text{ is on the stack immediately after processing } b_{t-1} \}|.$$
We then have

$$\mathrm{DP}(T, b) - (p/q) \, \mathrm{Suc}(T, b) = \frac{1}{dK^2} \sum_{c} \big( n_{c,1} - (p/q) n_{c,0} \big). \tag{4}$$

c 

Fix attention to some chip c, which was placed on the stack while processing the $i_c$-th bit, for some
$i_c \in [K - 1]$. First assume that c was later removed from the stack, and let $j_c \in [K - 1]$ be the index of the
bit whose processing caused c to be removed (thus, $b_{j_c} = 1$). Then the stack was not empty after processing
bits $i_c, \ldots, j_c - 1$, since in particular, the stack contained c. Thus each 1 appearing in $(b_{i_c+1}, \ldots, b_{j_c-1})$
caused exactly q chips to be removed from the stack. The removal caused by $[b_{j_c} = 1]$ removes some
number $r_c \le q$ of chips. Also, each 0 appearing in the same range causes p chips to be added. Now
$n_{c,0}$, $n_{c,1}$ count the number of 0s and 1s respectively among $b_{i_c+1}, \ldots, b_{j_c}$. Thus we have

$$H_{j_c} - H_{i_c} = p \, n_{c,0} - q(n_{c,1} - 1) - r_c \le p \, n_{c,0} - q(n_{c,1} - 1),$$

or rearranging,

$$n_{c,1} - (p/q) n_{c,0} \le (H_{i_c} - H_{j_c})/q + 1. \tag{5}$$

The chip c is added to the stack with $p - 1$ other chips while processing bit $i_c$. Later, c is removed from the
stack when processing bit $j_c$, along with at most $q - 1$ other chips. Thus we have

$$H_{i_c} - H_{j_c} \le p + q - 1,$$

and combining this with Eq. (5) gives

$$n_{c,1} - (p/q) n_{c,0} \le (p + q - 1)/q + 1 < p/q + 2. \tag{6}$$

Next suppose c was added after processing bit $i_c \in [K - 1]$, but never removed from the stack.
Then the stack was nonempty after processing bit $i_c$ and remained nonempty from then on, so each 1 in
$b_{i_c+1}, \ldots, b_{K-1}$ caused exactly q chips to be removed. By reasoning similar to the previous case, we get

$$n_{c,1} - (p/q) n_{c,0} = (H_{i_c} - H_{K-1})/q.$$

Now, c was added along with $p - 1$ other chips after processing $b_{i_c}$, and c remains on the stack after
processing $b_{K-1}$. It follows that $H_{i_c} - H_{K-1} \le p - 1$, so

$$n_{c,1} - (p/q) n_{c,0} \le (p - 1)/q. \tag{7}$$
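Plugging Eqs. (6) and (7) into Eq. (4), we bound

$$\mathrm{DP}(T, b) - (p/q) \, \mathrm{Suc}(T, b) \le \frac{1}{dK^2} \sum_{c} (p/q + 2) \le \frac{1}{dK^2} \cdot p(K - 1)(p/q + 2)$$

(since at most $p(K - 1)$ chips are ever used)

$$\le \frac{1}{K} \left( \frac{p}{q} \cdot \frac{p}{d} + \frac{2p}{d} \right) = \frac{1}{K} \left( \frac{\delta}{1 - \delta} \cdot \delta + 2\delta \right) = O\!\left( \frac{\delta}{(1 - \delta) K} \right).$$

Since $(p/q) = \delta/(1 - \delta)$, this establishes condition (iii), and completes the proof of Lemma 1. □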
4 More on Bit-Prediction

After thinking hard about car-streams and getting across the road safely, our frog had developed a taste
for prediction. In this section we present an extension of the result of Puzzle 3 that allows us to predict single
bits from significantly more general classes of binary sequences.

4.1 Bit-prediction algorithms 

Our result concerns the setting in which an observer is asked to correctly predict a single bit of their choice
from a sequence $x$. Unlike the frog crossing problem, in which the frog needed to correctly predict a 0, in
this problem the algorithm is allowed to predict either a 0 or a 1. Thus we need to modify our definition of
frog-strategies (in the obvious way), as follows. A bit-prediction strategy is a collection

$$\mathcal{S} = \{\pi_{\mathcal{S},b} : b \in \{0,1\}^{\omega}\},$$

where each $\pi_{\mathcal{S},b}$ is now a probability distribution over $(\mathbb{N} \times \{0,1\}) \cup \{\infty\}$. We require that for all
$b = (b_1, b_2, \ldots)$, $b' = (b'_1, b'_2, \ldots)$, and all $i \in \mathbb{N}$, $z \in \{0,1\}$,

$$(b_1, \ldots, b_{i-1}) = (b'_1, \ldots, b'_{i-1}) \implies \pi_{\mathcal{S},b}((i, z)) = \pi_{\mathcal{S},b'}((i, z)).$$

That is, $\pi_{\mathcal{S},b}((i, z))$ depends only on $b_1, \ldots, b_{i-1}$. As in the frog-crossing setting, our bit-prediction
strategies will be defined so that this constraint clearly holds.
Define the success probability

$$\mathrm{Suc}^{\text{bit-pred}}(\mathcal{S}, b) := \sum_{i \in \mathbb{N}} \pi_{\mathcal{S},b}((i, b_i))$$

as the probability that $\mathcal{S}$ correctly predicts a bit of $b$. For a subset $A \subseteq \{0,1\}^{\omega}$, define
$\mathrm{Suc}^{\text{bit-pred}}(\mathcal{S}, A) := \inf_{b \in A} \mathrm{Suc}^{\text{bit-pred}}(\mathcal{S}, b)$.

4.2 Finite automata 

To state our result, we need the familiar notion of a finite automaton over a binary alphabet. Formally, this
is a 3-tuple $M = (Q, s, \Delta)$, where:

• $Q$ is a finite set of states;

• $s \in Q$ is the designated starting state;

• $\Delta : Q \times \{0,1\} \to Q$ is the transition function.

For $q \in Q$, $B \subseteq Q$, say that $B$ is accessible from $q$ if there exists a sequence $y_1, \ldots, y_m$ of bits and a
sequence $q_0 = q, q_1, \ldots, q_m$ of states, such that

1. $\Delta(q_i, y_{i+1}) = q_{i+1}$ for $i = 0, 1, \ldots, m - 1$;

2. $q_m \in B$.

Say that $B$ is strongly accessible if, for any state $q$ that is accessible from the starting state $s$, $B$ is accessible
from $q$.
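
The two accessibility notions above reduce to graph reachability, which the following minimal Python sketch illustrates. The dictionary encoding of $\Delta$, the function names, and the three-state example automaton are illustrative assumptions rather than part of the paper; the sketch also assumes that $m = 0$ is allowed, so that $B$ is accessible from any $q \in B$.

```python
from collections import deque

def reachable(delta, start):
    """Set of states reachable from `start` by zero or more transitions."""
    seen = {start}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for bit in (0, 1):
            nxt = delta[(q, bit)]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def is_accessible(delta, q, bad):
    """True if some state of `bad` is reachable from q."""
    return bool(reachable(delta, q) & set(bad))

def is_strongly_accessible(delta, start, bad):
    """True if `bad` is accessible from every state reachable from `start`."""
    return all(is_accessible(delta, q, bad) for q in reachable(delta, start))

# Hypothetical example: state 2 is a sink, so {2} is strongly accessible
# while {0} is not (state 2 can never return to 0).
EXAMPLE_DELTA = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 2, (2, 0): 2, (2, 1): 2}
print(is_strongly_accessible(EXAMPLE_DELTA, 0, {2}))
print(is_strongly_accessible(EXAMPLE_DELTA, 0, {0}))
```

Breadth-first search is used only for convenience; any reachability computation would serve.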

Finite automata operate on infinite sequences $x \in \{0,1\}^{\omega}$ as follows: we let $q_0(x) := s$, and inductively
for $t \ge 1$ we define

$$q_t(x) := \Delta(q_{t-1}(x), x_t).$$

We say that $q_t(x)$ is the state of $M$ after $t$ steps on the sequence $x$.
For a state $q \in Q$ we define $V_q(x)$, the visits to $q$ on $x$, as

$$V_q(x) := \{t \ge 0 : q_t(x) = q\}.$$

Similarly, for $B \subseteq Q$, define $V_B(x)$ as $V_B(x) := \{t \ge 0 : q_t(x) \in B\}$.
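
As a concrete illustration of these definitions, the sketch below computes the state sequence and the visit set on a finite prefix of $x$. The encoding of $\Delta$ as a dictionary and the two-state parity automaton are assumptions made for the example only.

```python
def run_states(delta, start, bits):
    """Return [q_0, q_1, ..., q_T] for the finite prefix bits = (x_1, ..., x_T)."""
    states = [start]
    for b in bits:
        states.append(delta[(states[-1], b)])
    return states

def visits(delta, start, bits, targets):
    """Times t in {0, ..., T} with q_t in `targets` (a finite piece of V_B(x))."""
    targets = set(targets)
    return [t for t, q in enumerate(run_states(delta, start, bits)) if q in targets]

# Hypothetical parity automaton: the state flips whenever a 1 is read.
PARITY = {("even", 0): "even", ("even", 1): "odd",
          ("odd", 0): "odd", ("odd", 1): "even"}
print(run_states(PARITY, "even", [1, 0, 1, 1]))
print(visits(PARITY, "even", [1, 0, 1, 1], {"odd"}))
```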

4.3 Statement of the result 

Say we are presented with the bits of some unknown $x \in \{0,1\}^{\omega}$ sequentially. We assume that $x$ is "nice"
in the following sense: for some known finite automaton $M$, there is a set $B \subseteq Q$ of "bad" states of $M$,
which we assume $M$ visits only infrequently when $M$ is run on $x$. We show that, if $B$ is strongly accessible,
we can successfully predict a bit of $x$ with high probability.

First recall the definition of weak sparsity from Section 1.3. We say that a subset $S \subseteq \{0, 1, 2, \ldots\}$ is
weakly sparse if its characteristic sequence is weakly sparse. We prove:

Theorem 3. Let $M = (Q, s, \Delta)$ be a finite automaton, and let $B \subseteq Q$ be a strongly accessible set of states.
Define

$$A_{B,ws} := \{x \in \{0,1\}^{\omega} : V_B(x) \text{ is weakly sparse}\}.$$

Then for all $\varepsilon > 0$, there exists a bit-prediction strategy $\mathcal{S} = \mathcal{S}_{\varepsilon}$ such that

$$\mathrm{Suc}^{\text{bit-pred}}(\mathcal{S}_{\varepsilon}, A_{B,ws}) \ge 1 - \varepsilon.$$

We make a few remarks before proving Theorem 3. First, simple examples show that the conclusion of
Theorem 3 can hold even in some cases where $B$ is not strongly accessible. Finding necessary and sufficient
conditions on $B$ could be an interesting question for future study.

Second, it is natural to ask whether a more "quantitative" version of Theorem 3 can be given. Let
$A_{B,\varepsilon\text{-}ws}$ be the set of sequences $x$ for which the characteristic sequence of $V_B(x)$ is $\varepsilon$-weakly sparse (as
defined in Section 2). If $B$ is strongly accessible then, by a slight modification of our proof of Theorem 3,
one can derive a bit-prediction strategy $\mathcal{S}$ such that

$$\mathrm{Suc}^{\text{bit-pred}}(\mathcal{S}, A_{B,\varepsilon\text{-}ws}) \ge 1 - O(\ell\,\varepsilon^{1/\ell}),$$

where $\ell = |Q|$ is the number of states of the automaton $M$.

Something like this weak form of dependence on $\varepsilon$ is essentially necessary, as can be seen from the
following example. Let $M$ be an automaton with states $Q = \{1, 2, \ldots, \ell\}$, and define

$$\Delta(i, 1) := \min\{i + 1, \ell\}, \qquad \Delta(i, 0) := 1.$$

Let $B := \{\ell\}$, and consider running $M$ on a sequence $b$ of independent unbiased bits. Then with probability
1, $V_B(b)$ is $2^{-\ell+1}$-weakly sparse. On the other hand, no algorithm can predict a bit of $b$ with success
probability greater than 1/2.
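
The density claim in this example can be checked empirically. The sketch below simulates the automaton on pseudorandom bits and reports the fraction of steps spent in state $\ell$; the function name, the starting state 1, the step count, and the seed are assumptions chosen for illustration. The automaton is in state $\ell$ exactly when the last $\ell - 1$ bits read were all 1s, which is why the fraction should approach $2^{-\ell+1}$.

```python
import random

def visit_fraction(ell, steps, seed=0):
    """Fraction of times t in {1, ..., steps} at which the example automaton
    (Delta(i,1) = min(i+1, ell), Delta(i,0) = 1, started in state 1) is in state ell."""
    rng = random.Random(seed)
    state, hits = 1, 0
    for _ in range(steps):
        bit = rng.getrandbits(1)           # one unbiased pseudorandom bit
        state = min(state + 1, ell) if bit == 1 else 1
        hits += (state == ell)
    return hits / steps

# For ell = 4 the empirical fraction should be close to 2**-3 = 0.125.
print(visit_fraction(4, 200_000))
```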






4.4 Proof of Theorem 3



Let $A_{ws} \subseteq \{0,1\}^{\omega}$ denote the set of weakly sparse sequences. Given a sequence $x = (x_1, x_2, \ldots)$, define
$\neg x := (\neg x_1, \neg x_2, \ldots)$. Say that $x$ is co-weakly sparse, and write $x \in A_{co\text{-}ws}$, if $\neg x \in A_{ws}$. To prove
Theorem 3, we need two lemmas. The following lemma follows easily from Theorem 2.

Lemma 2. Given $\delta > 0$, there exists a bit-prediction strategy $\mathcal{P} = \mathcal{P}_{\delta}$ such that

$$\mathrm{Suc}^{\text{bit-pred}}(\mathcal{P}, A_{ws} \cup A_{co\text{-}ws}) > 1 - \delta.$$

$\mathcal{P}$ also has the "safety" property that for any $x \in \{0,1\}^{\omega}$, the probability that $\mathcal{P}$ outputs an incorrect
bit-prediction on $x$ is at most $\delta$.

Proof. First, note that a frog-strategy (as defined in Section 2) can be regarded as a bit-prediction strategy
that only ever predicts a 0. Let $\varepsilon = \gamma := \delta/4$. The bit-prediction strategy $\mathcal{P}$, given access to some sequence
$b$, simulates the frog-strategy $\mathcal{S}_{\varepsilon,\gamma}$ from Theorem 2 on $b$, and simultaneously simulates an independent copy
of $\mathcal{S}_{\varepsilon,\gamma}$ on $\neg b$. If $\mathcal{S}_{\varepsilon,\gamma}(b)$ ever outputs a prediction (i.e., that the next bit of $b$ will be 0), $\mathcal{P}$ immediately outputs
the same prediction. On the other hand, if $\mathcal{S}_{\varepsilon,\gamma}(\neg b)$ ever outputs a prediction (that the next bit of $\neg b$ will be
0), then $\mathcal{P}$ predicts that the next bit of $b$ will be 1. If both simulations output predictions simultaneously, $\mathcal{P}$
makes an arbitrary prediction for the next bit.

To analyze $\mathcal{P}$, say we are given input sequence $b \in A_{ws} \cup A_{co\text{-}ws}$. First suppose $b \in A_{ws}$. Then $\mathcal{S}_{\varepsilon,\gamma}(b)$
outputs a correct prediction with probability $> 1 - \varepsilon - \gamma$. Also, by the safety property of $\mathcal{S}_{\varepsilon,\gamma}$ shown in
Theorem 2, the probability that $\mathcal{S}_{\varepsilon,\gamma}(\neg b)$ outputs an incorrect prediction about $\neg b$ is at most $\varepsilon + \gamma$. Thus the
probability that $\mathcal{P}$ outputs a correct prediction on $b$ is greater than $1 - 2\varepsilon - 2\gamma = 1 - \delta$.

The case where $b \in A_{co\text{-}ws}$ is analyzed similarly. Finally, the safety property of $\mathcal{P}$ follows from the
safety property of $\mathcal{S}_{\varepsilon,\gamma}$. □
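
The way the proof combines two frog-strategy simulations can be sketched as follows. This is only an illustration of the wiring: `WindowFrog` is a hypothetical deterministic stand-in, not the randomized frog-strategy of Theorem 2, and the method names `wants_to_predict` and `observe` are assumptions introduced here.

```python
class WindowFrog:
    """Hypothetical stand-in for a frog-strategy (NOT the algorithm of Theorem 2).
    It announces "the next bit is 0" once the last `w` observed bits were all 0."""
    def __init__(self, w):
        self.w = w
        self.zero_run = 0

    def wants_to_predict(self):
        return self.zero_run >= self.w

    def observe(self, bit):
        self.zero_run = self.zero_run + 1 if bit == 0 else 0

def combined_prediction(bits, make_frog, tie_break=0):
    """Run one frog on b and an independent copy on the complement of b.
    Returns (i, guess) for the 0-based position i of the predicted bit, or None."""
    frog_b, frog_neg = make_frog(), make_frog()
    for i, bit in enumerate(bits):
        says_zero = frog_b.wants_to_predict()    # frog on b predicts b_i = 0
        says_one = frog_neg.wants_to_predict()   # frog on not-b predicts (not b_i) = 0
        if says_zero and says_one:
            return i, tie_break                  # simultaneous: arbitrary choice
        if says_zero:
            return i, 0
        if says_one:
            return i, 1
        frog_b.observe(bit)
        frog_neg.observe(1 - bit)
    return None

print(combined_prediction([0, 0, 0, 0], lambda: WindowFrog(2)))
```

Because only the first prediction is ever emitted, an incorrect output requires one of the two simulations to err, which is the source of the union bound used for the safety property.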

For the next lemma, we need some further definitions. Fix a finite automaton $M = (Q, s, \Delta)$. For
$x \in \{0,1\}^{\omega}$, let

$$Q_{\inf}(x) := \{q \in Q : |V_q(x)| = \infty\}.$$

Of course, $Q_{\inf}(x)$ is nonempty since $Q$ is finite. If $q \in Q_{\inf}(x)$, define a sequence $x^{(q)} \in \{0,1\}^{\omega}$ as follows.
If $V_q(x) = \{t(1), t(2), \ldots\}$ where $0 \le t(1) < t(2) < \ldots$, we define

$$x^{(q)}_i := x_{t(i)+1}.$$

In words: if $M$ is run on $x$, the $i$-th bit of $x^{(q)}$ records the bit of $x$ seen immediately after the $i$-th visit to
state $q$. If $q \notin Q_{\inf}(x)$, we define $x^{(q)} \in \{0,1\}^{*}$ similarly; in this case, $x^{(q)}_i$ is undefined if $M$ visits state $q$
fewer than $i$ times while running on $x$.

The following lemma gives us a useful property obeyed by sequences $x$ from the set $A_{B,ws}$ (defined in
the statement of Theorem 3).
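
To make the subsequences $x^{(q)}$ concrete, the following sketch splits a finite prefix of $x$ according to the state the automaton is in when each bit is read. The dictionary encoding of $\Delta$ and the parity automaton are assumptions for the example.

```python
from collections import defaultdict

def state_subsequences(delta, start, bits):
    """Split the prefix x_1..x_T into finite prefixes of x^(q), one per state q.
    The bit x_{t+1}, read while the automaton is in state q_t, is appended to x^(q_t)."""
    subseq = defaultdict(list)
    state = start
    for bit in bits:
        subseq[state].append(bit)
        state = delta[(state, bit)]
    return dict(subseq)

# Hypothetical parity automaton: the state flips whenever a 1 is read.
PARITY = {("even", 0): "even", ("even", 1): "odd",
          ("odd", 0): "odd", ("odd", 1): "even"}
print(state_subsequences(PARITY, "even", [1, 0, 1, 1]))
```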

Lemma 3. Given $M = (Q, s, \Delta)$, suppose $B \subseteq Q$ is strongly accessible. If $x \in A_{B,ws}$, then there exists a
state $q \in Q_{\inf}(x)$ such that

$$x^{(q)} \in A_{ws} \cup A_{co\text{-}ws}.$$

Proof. We prove the contrapositive. Assume that all $q \in Q_{\inf}(x)$ satisfy $x^{(q)} \notin A_{ws} \cup A_{co\text{-}ws}$; we will
show that $x \notin A_{B,ws}$.

Say that a state $q \in Q$ is frequent (on $x$) if there exist $\alpha, \beta > 0$ such that for all $T \in \mathbb{N}$,

$$|V_q(x) \cap \{0, 1, \ldots, T - 1\}| \ge \alpha T - \beta.$$

Let $F$ denote the set of frequent states. Clearly $F \subseteq Q_{\inf}(x)$. We will show:

1. $F = Q_{\inf}(x)$;

2. $F$ contains a state from $B$.

Item 2 will immediately imply that $x \notin A_{B,ws}$, as desired.

For each $q \in Q_{\inf}(x)$, our assumption $x^{(q)} \notin A_{ws} \cup A_{co\text{-}ws}$ implies that there is a $\delta_q \in (0, 1/2)$ and a
$K_q > 0$ such that for $k \ge K_q$,

$$\delta_q < \frac{1}{k}\left(x^{(q)}_1 + \cdots + x^{(q)}_k\right) < 1 - \delta_q. \tag{8}$$

Let $\delta := \min_q \delta_q$. Choose a value $T^* > 0$ such that each $q \in Q_{\inf}(x)$ appears at least $K_q$ times among
$(q_0(x), q_1(x), \ldots, q_{T^*-1}(x))$. Choose a second value $R > 0$, such that any $q \notin Q_{\inf}(x)$ occurs fewer than
$R$ times in the infinite sequence $(q_0(x), q_1(x), \ldots)$.
Let $\ell = |Q|$. Fix any $t \in \mathbb{N}$ satisfying

By simple counting, some $q^* \in Q$ occurs at least $t/\ell$ times in $(q_0(x), q_1(x), \ldots, q_{t-1}(x))$. We have
$t/\ell > R$, so this $q^*$ must lie in $Q_{\inf}(x)$. Eq. (8) then implies that the states $\Delta(q^*, 0), \Delta(q^*, 1)$ each
appear at least $\delta t/\ell - 1 > \delta^2 t/\ell$ times among $(q_0(x), q_1(x), \ldots, q_{t-1}(x))$. Now $\delta^2 t/\ell > R$, so we have
$\Delta(q^*, 0), \Delta(q^*, 1) \in Q_{\inf}(x)$.

Iterating this argument $(\ell - 1)$ times, we conclude that every state $q$ reachable from $q^*$ by a
sequence of $(\ell - 1)$ or fewer transitions lies in $Q_{\inf}(x)$, and appears at least $\delta^{2(\ell-1)} t/\ell = \Omega(t)$ times among
$(q_0(x), q_1(x), \ldots, q_{t-1}(x))$. But every $q \in Q_{\inf}(x)$ is reachable from $q^*$ by at most $(\ell - 1)$ transitions. Thus
$F = Q_{\inf}(x)$, proving Item 1 above.

The argument above shows that if $q \in Q_{\inf}(x)$, then $\Delta(q, 0), \Delta(q, 1) \in Q_{\inf}(x)$ as well. Recall that
$B$ is strongly accessible; it follows that $Q_{\inf}(x) \cap B$ is nonempty, proving Item 2 above. This proves
Lemma 3. □

We can now complete the proof of Theorem 3. Let $Q = \{p_1, \ldots, p_\ell\}$, where $\ell = |Q|$. We may assume
$\ell > 1$, for otherwise $A_{B,ws} = \emptyset$ and there is nothing to show. Given $\varepsilon > 0$, let $\delta := \varepsilon/(2\ell)$. We define the
algorithm $\mathcal{S} = \mathcal{S}_{\varepsilon}$ as follows. $\mathcal{S}$ runs in parallel $\ell$ different simulations

$$\mathcal{P}[1], \ldots, \mathcal{P}[\ell]$$

of the algorithm $\mathcal{P}_{\delta}$ from Lemma 2. $\mathcal{P}[j]$ is run, not on the input sequence $x$ itself, but on the subsequence
$x^{(p_j)}$. To determine which simulation receives each successive bit of $x$, the algorithm $\mathcal{S}$ simply simulates
$M$ on the bits of $x$ seen so far. (Note that, if $p_j \notin Q_{\inf}(x)$, then the simulation $\mathcal{P}[j]$ may "stall" indefinitely
without receiving any further input bits.)

Suppose that the simulation $\mathcal{P}[j]$ outputs a prediction $z \in \{0,1\}$ after seeing the $i$-th bit of $x^{(p_j)}$, and
that we subsequently reach a time $t$ such that $q_t(x) = p_j$ is the $(i + 1)$-st visit to state $p_j$. The algorithm $\mathcal{S}$
then predicts that $x_{t+1} = x^{(p_j)}_{i+1} = z$.

We now analyze $\mathcal{S}$. Fix any $x \in A_{B,ws}$. By the safety property of Lemma 2, each $\mathcal{P}[j]$ outputs
an incorrect prediction with probability at most $\delta$, so the overall probability of an incorrect prediction is
at most $\ell\delta = \varepsilon/2$. Also, since $x \in A_{B,ws}$, Lemma 3 tells us that there exists a $p_j \in Q_{\inf}(x)$ such that
$x^{(p_j)} \in A_{ws} \cup A_{co\text{-}ws}$. Thus, if $\mathcal{P}[j]$ is run individually on $x^{(p_j)}$, $\mathcal{P}[j]$ outputs a correct prediction with
probability greater than $1 - \delta$. We conclude that

$$\mathrm{Suc}^{\text{bit-pred}}(\mathcal{S}, x) > (1 - \delta) - \varepsilon/2 > 1 - \varepsilon,$$

using $\ell > 1$. This proves Theorem 3.
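
The routing of bits to per-state simulations described in this proof can be sketched as below. `RunPredictor` is a hypothetical deterministic placeholder for $\mathcal{P}_{\delta}$ (it is not the strategy of Lemma 2), and the interface (`pending`, `observe`) and the alternating two-state automaton are assumptions made for the illustration.

```python
class RunPredictor:
    """Hypothetical stand-in for P_delta from Lemma 2 (not the paper's algorithm):
    after `w` equal bits in a row, it predicts that the next bit repeats them."""
    def __init__(self, w):
        self.w, self.last, self.run = w, None, 0

    def pending(self):
        """Guess for the next bit of this predictor's subsequence, or None."""
        return self.last if self.run >= self.w else None

    def observe(self, bit):
        self.run = self.run + 1 if bit == self.last else 1
        self.last = bit

def routed_prediction(delta, start, bits, make_predictor):
    """Simulate M on the bits seen so far and feed each bit to the predictor owned
    by the current state. Returns (t, z), a guess z for bits[t] (0-based), or None."""
    predictors = {}
    state = start
    for t, bit in enumerate(bits):
        if state not in predictors:
            predictors[state] = make_predictor()
        guess = predictors[state].pending()
        if guess is not None:
            return t, guess          # prediction issued on the next visit to `state`
        predictors[state].observe(bit)
        state = delta[(state, bit)]
    return None

# Hypothetical automaton that alternates between A and B regardless of input,
# so x^(A) and x^(B) are the odd- and even-position bits of x.
ALTERNATE = {("A", 0): "B", ("A", 1): "B", ("B", 0): "A", ("B", 1): "A"}
print(routed_prediction(ALTERNATE, "A", [0, 1, 0, 0, 0, 1, 0], lambda: RunPredictor(3)))
```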

5 The Density Prediction Game 

In this section we prove Theorem 1 from Section 1.4. The proof uses a technique from the analysis of
martingales that seems to be folklore; my understanding of this technique benefited greatly from conversations
with Russell Impagliazzo.

For any fixed $\delta, \varepsilon$, our prediction strategy will work entirely within a finite interval $(x_1, \ldots, x_T)$ of the
sequence $x$. We note that, to derive a $(\delta, \varepsilon)$-successful strategy over this interval, it suffices to show that for
every distribution $\mathcal{D}$ over $\{0,1\}^T$, there exists a strategy $\mathcal{S}_{\mathcal{D}}$ that is $(\delta, \varepsilon)$-successful when played against $\mathcal{D}$.
This follows from the minimax theorem of game theory, or from the result of Sandroni [San03] mentioned
in Section 1.2. However, this observation would lead to a nonconstructive proof of Theorem 1, and in any
case does not seem to make the proof any simpler. Thus we will not follow this approach.

Let $\delta, \varepsilon > 0$ be given; we give a forecasting strategy $\mathcal{S} = \mathcal{S}_{\delta,\varepsilon}$ for the density prediction game, and prove
that $\mathcal{S}$ is $(\delta, \varepsilon)$-successful. Set $n := \lceil 4/(\delta\varepsilon^2) \rceil$. Our strategy will always make a prediction about an interval
$x_a, \ldots, x_b$ where $a \le b \le 2^n$. The strategy $\mathcal{S}$ is defined as follows:

1. Choose $R \in \{1, \ldots, n\}$ uniformly. Choose $S$ uniformly from $\{1, \ldots, 2^{n-R}\}$.

2. Ignore the first $t = (S - 1) \cdot 2^R$ bits of $x$. Observe bits $x_{t+1}, \ldots, x_{t+2^{R-1}}$, and let $p$ be the fraction of
1s in this interval. Immediately after seeing $x_{t+2^{R-1}}$, predict:

"Out of the next $2^{R-1}$ bits, a $p$ fraction will be 1s."

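The two steps of the strategy translate almost directly into code. The sketch below is a minimal illustration that takes $n$ as a parameter, since $n = \lceil 4/(\delta\varepsilon^2) \rceil$ makes $2^n$ far too large to materialize for small $\delta, \varepsilon$; the function name, the 0-indexed list representation of $x$, and the returned triple are assumptions introduced here.

```python
import random

def density_forecast(x, n, rng=random):
    """One run of the strategy on the first 2**n bits of x (a 0-indexed list of bits).
    Returns (p, start, length): the forecast is that a p fraction of
    x[start:start + length] are 1s."""
    R = rng.randint(1, n)                 # R uniform in {1, ..., n}
    S = rng.randint(1, 2 ** (n - R))      # block index uniform in {1, ..., 2^(n-R)}
    t = (S - 1) * 2 ** R                  # number of bits ignored
    half = 2 ** (R - 1)
    observed = x[t:t + half]              # x_{t+1}, ..., x_{t+2^(R-1)} in 1-based terms
    p = sum(observed) / half
    return p, t + half, half

# Usage on a short hypothetical sequence with n = 4 (16 bits).
bits = [0, 1] * 8
p, start, length = density_forecast(bits, 4, random.Random(0))
actual = sum(bits[start:start + length]) / length
print(p, actual)
```

The forecast window always lies inside the first $2^n$ bits, matching the requirement $a \le b \le 2^n$.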
We now analyze $\mathcal{S}$. To do so, it is helpful to describe $\mathcal{S}$ in a slightly different fashion. Let us re-index
the first $2^n$ bits of our sequence $x$, considering each such bit to be indexed by a string $z \in \{0,1\}^n$. We use
lexicographic order, so that the sequence is indexed $x_{0^n}, x_{0^{n-1}1}, x_{0^{n-2}10}$, and so on.

Let $T$ be a directed binary tree of height $n$, whose vertices at depth $i$ ($0 \le i \le n$) are indexed by binary
strings of length $i$; in particular, the root vertex is labeled by the empty string. If $i < n$ and $y \in \{0,1\}^i$, the
vertex $v_y$ has left and right children $v_{y0}, v_{y1}$ respectively. Each leaf vertex is indexed by an $n$-bit string $z$,
and any such vertex $v_z$ is labeled with the bit $x_z$.

For $y \in \{0,1\}^i$, let $T_y$ denote the subtree of $T$ rooted at $v_y$. A direct translation of the strategy $\mathcal{S}$ into
our current perspective gives the following equivalent description of $\mathcal{S}$:

1'. Choose $R \in \{1, \ldots, n\}$ uniformly. Starting at the root of $T$, take a directed, unbiased random walk
of length $n - R$, reaching a vertex $v_Y$ where $Y \in \{0,1\}^{n-R}$.

2'. Observe the bits of $x$ that label leaf vertices in $T_{Y0}$, and let $p$ be the fraction of 1s seen among these
bits. Immediately after seeing the last of these bits, predict:

"Out of the next $2^{R-1}$ bits of $x$ (i.e., those labeling leaf vertices in $T_{Y1}$), a $p$ fraction will be 1s."

To analyze $\mathcal{S}$ in this form, fix any binary sequence $x$. We consider the random walk performed in $\mathcal{S}$ to
be extended to an unbiased random walk of length $n$. The walk terminates at some leaf vertex $v_Z$, where
$Z = (z_1, \ldots, z_n)$ is uniform over $\{0,1\}^n$.

For $0 \le i \le n$ and $y \in \{0,1\}^i$, define

$$p(y) := 2^{i-n} \sum_{w \in \{0,1\}^{n-i}} x_{yw}$$

as the fraction of 1s among the labels of leaf vertices of $T_y$. For $0 \le t \le n$, define the random variable

$$X(t) := p(z_1, \ldots, z_t),$$

defined in terms of $Z$, where $X(0) = p(\emptyset)$ is the value of $p$ at the empty string. The sequence $X(0), \ldots, X(n)$ is a
martingale; we follow a folklore technique by analyzing the squared differences between terms in the sequence.
First, we have $X(t) \in [0, 1]$, so that $(X(n) - X(0))^2 \le 1$. On the other hand,

$$\begin{aligned}
\mathbb{E}[(X(n) - X(0))^2] &= \mathbb{E}\Big[\Big(\sum_{0 \le t < n} (X(t+1) - X(t))\Big)^2\Big] \\
&= \mathbb{E}\Big[\sum_{0 \le t < n} (X(t+1) - X(t))^2\Big] + \mathbb{E}\Big[2 \sum_{0 \le s < t < n} (X(s+1) - X(s))(X(t+1) - X(t))\Big].
\end{aligned} \tag{9}$$

Now, for $0 \le s < t < n$ and for any outcome of the bits $z_1, \ldots, z_t$ (which determine $X(s)$, $X(s+1)$,
and $X(t)$), we have

$$\begin{aligned}
\mathbb{E}[(X(t+1) - X(t)) \mid z_1, \ldots, z_t] &= \mathbb{E}_{z_{t+1} \in \{0,1\}}[p(z_1, \ldots, z_{t+1})] - p(z_1, \ldots, z_t) \\
&= \tfrac{1}{2}\,[p(z_1, \ldots, z_t, 0) + p(z_1, \ldots, z_t, 1)] - p(z_1, \ldots, z_t) \\
&= 0.
\end{aligned}$$

Thus the second right-hand term in Eq. (9) is 0, and

$$\mathbb{E}[(X(n) - X(0))^2] = \sum_{0 \le t < n} \mathbb{E}[(X(t+1) - X(t))^2]. \tag{10}$$

Next we relate this to the accuracy of our guess $p$. Let $p^*$ be the fraction of 1s in $T_{Y1}$, i.e., the quantity
$\mathcal{S}$ attempts to predict; note that $p^*$ and $p$ are both random variables. From the definitions, we have

$$p = p(Y0), \qquad p^* = p(Y1), \qquad X(n - R) = \tfrac{1}{2}(p + p^*).$$

Also,

$$X(n - R + 1) = \begin{cases} p & \text{if } z_{n-R+1} = 0, \\ p^* & \text{if } z_{n-R+1} = 1. \end{cases}$$

Thus we have the identity

$$(X(n - R + 1) - X(n - R))^2 = \tfrac{1}{4}(p - p^*)^2.$$

Now, $n - R$ is uniform over $\{0, 1, \ldots, n - 1\}$, and independent of $Z$. It follows from Eq. (10) that

$$\mathbb{E}[(X(n - R + 1) - X(n - R))^2] = \frac{1}{n}\,\mathbb{E}[(X(n) - X(0))^2] \le 1/n.$$

Combining, we have

$$\mathbb{E}[(p - p^*)^2] \le 4/n. \tag{11}$$

On the other hand,

$$\mathbb{E}[(p - p^*)^2] \ge \Pr[|p - p^*| > \varepsilon] \cdot \varepsilon^2. \tag{12}$$

Combining Eqs. (11) and (12), we obtain

$$\Pr[|p - p^*| > \varepsilon] \le 4/(n\varepsilon^2) \le \delta,$$

by our setting $n = \lceil 4/(\delta\varepsilon^2) \rceil$. This proves Theorem 1.
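
The bound in Eq. (11) can be verified exactly for small $n$ by enumerating every choice of $R$ and $Y$. The sketch below does this with rational arithmetic; the function names and the list encoding of $x$ are assumptions for the illustration. Since $X(n)$ is a single leaf bit with mean $X(0) = \mu$, Eq. (10) in fact gives the exact value $\mathbb{E}[(p - p^*)^2] = (4/n)\,\mu(1 - \mu)$, which the usage lines compare against.

```python
from fractions import Fraction
from itertools import product

def leaf_fraction(x, n, y):
    """p(y): fraction of 1s among the leaves below the vertex indexed by bit-tuple y."""
    i = len(y)
    lo = int("".join(map(str, y)) or "0", 2) << (n - i)   # first leaf below v_y
    block = x[lo:lo + 2 ** (n - i)]
    return Fraction(sum(block), len(block))

def expected_sq_error(x, n):
    """Exact E[(p - p*)^2] over uniform R in {1..n} and uniform Y in {0,1}^(n-R)."""
    total = Fraction(0)
    for R in range(1, n + 1):
        prefixes = list(product((0, 1), repeat=n - R))
        for y in prefixes:
            diff = leaf_fraction(x, n, y + (0,)) - leaf_fraction(x, n, y + (1,))
            total += diff * diff / (n * len(prefixes))
    return total

# Usage on a hypothetical 8-bit sequence (n = 3).
x = [1, 0, 0, 1, 1, 1, 0, 1]
mu = Fraction(sum(x), len(x))
print(expected_sq_error(x, 3), Fraction(4, 3) * mu * (1 - mu))
```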

6 Questions for Future Work 

1. Fix some $p \in [1/2, 1]$; is there a satisfying characterization of the sets $A \subseteq \{0,1\}^{\omega}$ for which some
bit-prediction strategy (as defined in Section 4.1) succeeds with probability $\ge p$ against all $x \in A$?
Perhaps there is a characterization in terms of some appropriate notion of dimension, analogous to the
gale characterizations of Hausdorff dimension [Lut03a] and packing dimension [AHLM07].

2. Could the study of computationally bounded bit-prediction strategies be of value to the study of
complexity classes, by analogy to the study of computationally bounded gales in [Lut03a, AHLM07] and
in related work?

3. Find necessary and sufficient conditions on the set $B$ of "infrequently visited" states, for the
conclusion of Theorem 3 (in Section 4.3) to hold.

4. Our $(\delta, \varepsilon)$-successful forecasting strategy in Section 5 always makes a forecast about an interval of
bits within $x_1, \ldots, x_m$, where $m = 2^{\lceil 4/(\delta\varepsilon^2) \rceil}$. It would be interesting to know whether some
alternative strategy could make forecasts within a much smaller interval (for instance, with
$m = \mathrm{poly}(\delta^{-1}, \varepsilon^{-1})$). It would also be interesting to look at a setting in which the forecaster is allowed
to make predictions about sets other than intervals.